Exploiting Intra-Annotator Rating Consistency Through Copeland's Method for Estimation of Ground Truth Labels in Couples' Therapy
Abstract
Behavioral and mental health research and its clinical applications rely widely on quantifying human behavioral expressions. This often requires human-derived behavioral annotations, which tend to be noisy, especially when the psychological objects of interest are latent and subjective in nature. This paper focuses on exploiting multiple human annotations to improve the reliability of the ensemble decision by creating a ranking of the evaluated objects. To create this ranking, we employ an adapted version of Copeland's counting method, which yields robust inter-annotator rankings and agreement. We then use a simple mapping between the ranked objects and the evaluation scale that preserves the original distribution of ratings, based on a maximum likelihood estimate of that distribution. We apply the algorithm to ratings that lack a ground truth and therefore assess it in two ways: (1) by corrupting the annotations with different noise distributions and computing the inter-annotator agreement between the ensemble estimates derived from the original and corrupted data using Krippendorff's α; and (2) by replacing one annotator at a time with the ensemble estimate. Our results suggest that the proposed method provides a robust alternative that suffers less from individual annotator preferences/biases and scale misuse.
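The page does not include an implementation; the sketch below is a minimal reading of the two steps the abstract describes: an adapted Copeland count over within-annotator pairwise comparisons, followed by a mapping from ranks back to the rating scale that preserves the empirical (maximum-likelihood) rating distribution. Tie handling, the quota rounding rule, and all names are assumptions of this sketch, not the paper's specification.

```python
from itertools import combinations
import numpy as np

def copeland_ranking(ratings):
    """ratings: (num_annotators, num_objects) array of ordinal scores.
    Returns object indices ordered from lowest- to highest-rated."""
    _, n = ratings.shape
    score = np.zeros(n)
    for i, j in combinations(range(n), 2):
        # Each annotator votes by comparing their OWN two ratings, so only
        # intra-annotator consistency is used; ties contribute nothing.
        votes = np.sign(ratings[:, i] - ratings[:, j])
        margin = np.sign(votes.sum())          # majority across annotators
        score[i] += margin
        score[j] -= margin
    return np.argsort(score, kind="stable")

def map_ranks_to_scale(order, ratings):
    """Assign scale values to the ranked objects so that the estimated
    labels follow the empirical (maximum-likelihood) rating distribution."""
    levels, counts = np.unique(ratings, return_counts=True)
    probs = counts / counts.sum()
    n = len(order)
    quota = np.floor(np.cumsum(probs) * n + 0.5).astype(int)
    quota[-1] = n                               # absorb rounding drift
    estimate = np.empty(n)
    start = 0
    for level, end in zip(levels, quota):
        estimate[order[start:end]] = level      # lowest ranks get lowest levels
        start = end
    return estimate

# toy usage: 3 annotators rate 5 objects on a 1-5 scale
ratings = np.array([[1, 3, 2, 5, 4],
                    [2, 3, 1, 5, 4],
                    [1, 4, 2, 5, 3]])
order = copeland_ranking(ratings)
print(map_ranks_to_scale(order, ratings))
```

On this toy data the five objects receive the estimates [1, 3, 2, 5, 4]: the Copeland step resolves the annotators' disagreements into a single order, and the quota step guarantees the output uses the 1-5 scale with the same frequencies as the raw ratings.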
Similar Papers
Inferring ground truth from multi-annotator ordinal data: a probabilistic approach
A popular approach for large-scale data annotation tasks is crowdsourcing, wherein each data point is labeled by multiple noisy annotators. We consider the problem of inferring ground truth from noisy ordinal labels obtained from multiple annotators of varying and unknown expertise levels. Annotation models for ordinal data have been proposed mostly as extensions of their binary/categorical cou...
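This related abstract is truncated before the model itself, so no attempt is made to reproduce it; for orientation, below is a minimal sketch of the classic Dawid–Skene EM aggregator for categorical labels, i.e., the kind of categorical counterpart such ordinal models extend. It is a generic baseline, not this paper's probabilistic model.

```python
import numpy as np

def dawid_skene(labels, n_classes, n_iter=50):
    """labels: (num_annotators, num_items) integer class labels in [0, n_classes).
    Returns (num_items, n_classes) posterior over true classes."""
    A, N = labels.shape
    # initialize item posteriors from per-item majority vote
    post = np.zeros((N, n_classes))
    for i in range(N):
        post[i] = np.bincount(labels[:, i], minlength=n_classes)
    post /= post.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # M-step: class prior and per-annotator confusion matrices
        prior = post.mean(axis=0)
        conf = np.zeros((A, n_classes, n_classes))
        for a in range(A):
            for i in range(N):
                conf[a, :, labels[a, i]] += post[i]   # soft counts
            conf[a] /= conf[a].sum(axis=1, keepdims=True) + 1e-12
        # E-step: recompute item posteriors given prior and confusions
        log_post = np.tile(np.log(prior + 1e-12), (N, 1))
        for a in range(A):
            log_post += np.log(conf[a][:, labels[a]].T + 1e-12)
        post = np.exp(log_post - log_post.max(axis=1, keepdims=True))
        post /= post.sum(axis=1, keepdims=True)
    return post
```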
Using community structure detection to rank annotators when ground truth is subjective
Learning from labels provided by multiple annotators has attracted a lot of interest in the machine learning community. With the advent of crowdsourcing, cheap, noisy labels are easy to obtain. This has raised the question of how to assess annotator quality. Prior work uses Bayesian inference to estimate consensus labels and obtain annotator scores based on expertise; the key assumptions are th...
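The details of the community-detection step are cut off above; the sketch below shows one plausible construction under assumptions of mine: annotators as nodes in a weighted agreement graph (Spearman correlation as the edge weight), partitioned with NetworkX's greedy modularity detector. The paper may well use a different graph or detector.

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities
from scipy.stats import spearmanr

def annotator_communities(ratings):
    """ratings: (num_annotators, num_items). Build an agreement graph over
    annotators and partition it into communities; larger, tighter communities
    plausibly correspond to more reliable annotators."""
    A = ratings.shape[0]
    G = nx.Graph()
    G.add_nodes_from(range(A))
    for a in range(A):
        for b in range(a + 1, A):
            rho, _ = spearmanr(ratings[a], ratings[b])
            if rho > 0:                      # keep only positive agreement
                G.add_edge(a, b, weight=rho)
    return [sorted(c) for c in greedy_modularity_communities(G, weight="weight")]
```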
MOMRESP: A Bayesian Model for Multi-Annotator Document Labeling
Data annotation in modern practice often involves multiple, imperfect human annotators. Multiple annotations can be used to infer estimates of the ground-truth labels and to estimate individual annotator error characteristics (or reliability). We introduce MOMRESP, a model that improves upon item response models to incorporate information from both natural data clusters and annotations f...
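MOMRESP is a specific Bayesian model and is not reproduced here. Purely to illustrate the general idea of combining annotation votes with natural data clusters, the toy sketch below shrinks each item's vote counts toward its cluster's aggregate label distribution; the KMeans clustering, the pseudo-count weight alpha, and all names are assumptions, not the paper's design.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_smoothed_votes(features, labels, n_classes, n_clusters=5, alpha=2.0):
    """features: (num_items, dim); labels: (num_annotators, num_items).
    Smooth per-item annotation votes toward the label distribution of the
    item's data cluster (toy illustration, NOT the MOMRESP model)."""
    A, N = labels.shape
    votes = np.zeros((N, n_classes))
    for a in range(A):
        votes[np.arange(N), labels[a]] += 1
    clusters = KMeans(n_clusters=n_clusters, n_init=10,
                      random_state=0).fit_predict(features)
    post = np.empty_like(votes)
    for c in range(n_clusters):
        idx = clusters == c
        dist = votes[idx].sum(axis=0)
        dist /= dist.sum()
        post[idx] = votes[idx] + alpha * dist   # cluster-level pseudo-counts
    return post / post.sum(axis=1, keepdims=True)
```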
An Expectation Maximization Approach to Joint Modeling of Multidimensional Ratings Derived from Multiple Annotators
Ratings from multiple human annotators are often pooled in applications where the ground truth is hidden. Examples include annotating perceived emotions and assessing quality metrics for speech and images. These ratings are not restricted to a single dimension and can be multidimensional. In this paper, we propose an Expectation-Maximization-based algorithm to model such ratings. Our model assum...
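Since the abstract is truncated before the model's assumptions, the sketch below is only one common instantiation of the idea: each annotator's multidimensional rating is treated as a hidden true vector plus annotator-specific Gaussian noise, fit by alternating updates. The noise model and all names are mine, not the paper's.

```python
import numpy as np

def alternating_gaussian_ratings(ratings, n_iter=100):
    """ratings: (num_annotators, num_items, num_dims).
    Model: rating[a, i] = truth[i] + Gaussian noise with per-annotator
    variance sigma2[a]. Alternates truth and variance updates (hard-EM
    style; a full EM would keep a posterior over the hidden truth)."""
    A, N, D = ratings.shape
    sigma2 = np.ones(A)
    for _ in range(n_iter):
        w = 1.0 / sigma2                            # annotator precisions
        truth = np.einsum("a,aid->id", w, ratings) / w.sum()
        resid = ratings - truth                     # broadcasts over annotators
        sigma2 = (resid ** 2).mean(axis=(1, 2)) + 1e-8
    return truth, sigma2
```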
Exploiting Associations between Class Labels in Multi-label Classification
Multi-label classification has many applications in text categorization, biology, and medical diagnosis, in which multiple class labels can be assigned to each training instance simultaneously. As it is often the case that there are relationships between the labels, extracting these relationships and taking advantage of them during the training or prediction phases ...
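The abstract cuts off before the paper's own method; as a standard illustration of exploiting label associations (not the paper's approach), the snippet below uses scikit-learn's classifier chain, where each label's classifier also sees the labels predicted earlier in the chain.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain

# toy multi-label data: 100 samples, 10 features, 3 correlated labels
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y1 = (X[:, 0] + X[:, 1] > 0).astype(int)
y2 = ((X[:, 2] > 0) & (y1 == 1)).astype(int)   # y2 depends on y1
y3 = (X[:, 3] > 0).astype(int)
Y = np.column_stack([y1, y2, y3])

# each classifier in the chain sees the features plus the labels
# predicted earlier in the chain, so label correlations are exploited
chain = ClassifierChain(LogisticRegression(), order=[0, 1, 2], random_state=0)
chain.fit(X, Y)
print(chain.predict(X[:5]))
```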